OCR villains
From MobileRead
This page lists many of the typical OCR errors found when proofreading a book. Spell checking will catch some of them and grammar checking a few more, but others simply need a keen eye. In some cases you can search the document and replace the ones that don't belong.
Numbers, symbols and letters
0 <--> O {zero <--> Uppercase o}
1 l I i ! <--> each other
{digit One, lowercase L, uppercase i, lowercase i, exclamation mark}
2 <--> Z
5 <--> S
6 <--> G {uppercase G}
7 <--> ? {question mark}
7 and / = I {uppercase I in italic}
] = J
square bracket = uppercase J
]ane = Jane
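Many of the digit and symbol confusions above leave a telltale trace: a digit glued to letters, or a bracket at the front of a word. The following is a minimal sketch of a search that flags such tokens for manual review; the function name `flag_suspects` and both patterns are illustrative assumptions, not part of any particular tool. Legitimate tokens such as "1st" will also be flagged, so this only narrows down where to look.

```python
import re

# A token containing at least one digit and at least one letter, e.g. "B0b" or "l984".
MIXED = re.compile(r"\b(?=\w*\d)(?=\w*[A-Za-z])\w+\b")
# A closing square bracket at the start of a word, e.g. "]ane" for "Jane".
BRACKET_J = re.compile(r"(?<!\S)\][a-z]+")

def flag_suspects(text):
    """Return tokens worth checking by eye; nothing is corrected automatically."""
    return MIXED.findall(text) + BRACKET_J.findall(text)

print(flag_suspects("B0b met ]ane in l984 at 5 pm"))
# → ['B0b', 'l984', ']ane']
```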
Letters
e <--> c
are <--> arc
cl <--> d
clock <--> dock
close <--> dose
f ligature confusion
ff, fi, fl, ffi
h <--> b
back <--> hack
harrow <--> barrow
H = ll
weH = well
H or h = li
Hbrary = library
hke = like
hn = lm
ahnost = almost
j <--> J {lowercase <--> uppercase J }
jane = Jane
Jury = jury
rn <--> m
Mom <--> Morn
stem <--> stern
earnest = camest {this one also has the e <--> c confusion}
modem = modern
corner = comer
ri <--> n
arid <--> and
r = f
ringers = fingers
m <--> in
stein <--> stem
rmg = ring
inoth = moth
im <--> un
unport = import
imdone = undone
n <--> u
bnt = but
teut = tent
uest = nest
ii = u
iinder = under
B <--> R {uppercase}
DEABEST = DEAREST
Robby <--> Bobby
F <--> P {uppercase}
Full <--> Pull
ih = th
feaiher = feather
di = th {weird, but it happens a lot}
die = the
tii = th
tiie = the
tli = th
tlie = the
Tm = "I'm {also occurs without the leading quote}
T = I {uppercase i}
U = ll, li, il {double lowercase L, or lowercase L next to i}
WeU = Well
Ufe = life
untU = until
vv = w
vvhen = when
\V = W
y <--> v
yery = very
verv = very
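The letter confusions above can be turned into a simple suggestion tool: take a suspicious word, try each substitution once, and keep the results that appear in a word list. The sketch below assumes a hypothetical `known_words` set supplied by the reader (for example, loaded from a dictionary file) and uses only a handful of the pairs listed above. It applies a single substitution per attempt, so double errors are not covered, and real words such as "modem" still need to be judged in context.

```python
# A few substitutions from the list above, as (OCR output, intended text).
CONFUSIONS = [
    ("H", "ll"), ("U", "ll"), ("h", "li"), ("hn", "lm"),
    ("m", "rn"), ("rn", "m"), ("ii", "u"), ("n", "u"), ("u", "n"),
    ("tli", "th"), ("tii", "th"), ("vv", "w"),
]

def candidates(word, known_words):
    """Suggest known words reachable from `word` by one substitution."""
    found = set()
    for bad, good in CONFUSIONS:
        start = word.find(bad)
        while start != -1:
            fixed = word[:start] + good + word[start + len(bad):]
            if fixed in known_words:
                found.add(fixed)
            start = word.find(bad, start + 1)
    return sorted(found)

# Hypothetical word list for illustration only.
known = {"well", "almost", "the", "when", "under", "modern"}
print(candidates("ahnost", known))  # → ['almost']
print(candidates("modem", known))   # → ['modern']
```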
Punctuation errors
/' = ," or ." {or the same with a single quote}
* = quote mark
** *' '*
'' = " {two single quotes, should be a double quote}
Space following opening quote mark
Space preceding closing quote or punctuation mark.
He did this ; then he did that ; then he said : “ You aren’t ready ! ”
Apostrophe goes missing, stranding the last letter
I m = I’m, don t = don’t, Bob s = Bob’s
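The spacing errors and stranded letters described above lend themselves to regular-expression searches. The following sketch assumes English punctuation conventions (French typography, for instance, deliberately puts a space before ; : ! ?) and curly double quotes; the function names are made up for illustration. Spacing is corrected automatically, while possible missing apostrophes are only listed, since a pattern like "word s" can occasionally be legitimate.

```python
import re

def tidy_spacing(text):
    """Remove stray spaces before punctuation and just inside curly double quotes."""
    text = re.sub(r"(?<=\w) +([;:!?,.])", r"\1", text)  # "this ;" becomes "this;"
    text = re.sub(r"“ +", "“", text)                    # space after an opening quote
    text = re.sub(r" +”", "”", text)                    # space before a closing quote
    return text

def find_stranded_letters(text):
    """List places where a lone m, s or t may have lost its apostrophe."""
    return re.findall(r"\b[A-Za-z]+ (?:m|s|t)\b", text)

print(tidy_spacing("He did this ; then he said : “ You aren’t ready ! ”"))
# → He did this; then he said: “You aren’t ready!”
print(find_stranded_letters("I m sure Bob s dog don t bite"))
# → ['I m', 'Bob s', 'don t']
```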
The following often occur after a "Smarten Punctuation" action:
Backward quote marks:
” close quote at start of paragraph
“ open quote at end of paragraph
Reversed single and double quotes in nested quotations:
“And I said to him, ‘Quit that!”’
‘“O what a tangled web we weave,’” she said.
A "straight" apostrophe should become ’ (right single quote), not ‘ (left single quote).
This often happens at the start of a word:
‘em should be ’em, ‘tis should be ’tis
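Wrong leading apostrophes like these can be repaired by searching for a left single quote directly before a known elided word. A minimal sketch follows; the `ELISIONS` list is an assumption to be adapted to the book at hand ("em" and "tis" come from the examples above, "twas" is added for illustration), and it is kept short on purpose because a genuine opening quote can precede almost any word.

```python
import re

# Elided words that should start with an apostrophe, not an opening quote.
ELISIONS = ["em", "tis", "twas"]

PATTERN = re.compile(r"‘(" + "|".join(ELISIONS) + r")\b", re.IGNORECASE)

def fix_leading_apostrophes(text):
    """Replace ‘ with ’ in front of the listed elisions only."""
    return PATTERN.sub(r"’\1", text)

print(fix_leading_apostrophes("Give ‘em room. ‘Tis late, he said, ‘quite late.’"))
# → Give ’em room. ’Tis late, he said, ‘quite late.’
```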
- hyphenation problems. The source has a hyphen where a word breaks at the end of a line, but the hyphen is left in when the document reflows. (A search can usually find these.)
) with a space in front. Sometimes ( will have a space after it. Search for these.
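The searches suggested for leftover hyphens and misplaced spaces around parentheses might look like the sketch below. It only collects matches for review and changes nothing, because a hyphen between two lowercase words is often legitimate ("well-known"). The function name and the result keys are illustrative assumptions.

```python
import re

def find_reflow_leftovers(text):
    """Collect hyphenated lowercase words and badly spaced parentheses."""
    return {
        # Lowercase letters on both sides of a hyphen, e.g. "some-thing".
        "hyphens": re.findall(r"\b[a-z]+-[a-z]+\b", text),
        # "(" followed by a space, or ")" preceded by a space.
        "paren_spaces": re.findall(r"\( | \)", text),
    }

print(find_reflow_leftovers("It was some-thing ( or so ) he said."))
# → {'hyphens': ['some-thing'], 'paren_spaces': ['( ', ' )']}
```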